There are two main types of spatial data: vector and raster data.
Vector: Vectors (also called shapefiles) consist of points, lines and polygons. These shapes are attached to a dataframe, where each row corresponds to a different spatial element.
Raster: Rasters are spatially-referenced grids where each cell has one value.
Shapefile
Raster
Here, we'll use the following packages
# Loading the data hh_data <- read.csv(file.path(finalData,"HH_data.csv")) str(hh_data)
## 'data.frame': 422 obs. of 5 variables: ## $ X : int 1 3 4 5 6 7 8 10 11 12 ... ## $ expend_food_yearly: num 250452 420029 484468 326109 131487 ... ## $ food_security : Factor w/ 3 levels "Little Hunger",..: 1 2 1 1 3 3 1 2 3 1 ... ## $ longitude_scramble: num 30.4 30.4 30.4 30.4 30.4 ... ## $ latitude_scramble : num -1.97 -2 -1.99 -1.98 -1.95 ...
# Plot household coordinates from data set
ggplot() +
geom_point(data = hh_data,
aes(x = longitude_scramble,
y = latitude_scramble),
size = .7)
# Plot household coordinates from data set
ggplot() +
geom_point(data = hh_data,
aes(x = longitude_scramble,
y = latitude_scramble),
size = .7)
coord_map() creates an approximated projection of an areacoord_map() works best for smaller areas closer to the equatorcoord_map()# Plot undistorted household coordinates from data set
ggplot() +
geom_point(data = hh_data,
aes(x = longitude_scramble,
y = latitude_scramble),
size = .7) +
coord_quickmap()
# Plot undistorted household coordinates from data set
ggplot() +
geom_point(data = hh_data,
aes(x = longitude_scramble,
y = latitude_scramble),
size = .7) +
coord_quickmap()
Because we have other information about the households, we can use the map to illustrate the geographic districution of the variables in the data set
This is done using the same arguments as we used for non-spatial data:
# Plot undistorted household coordinates from data set
ggplot() +
geom_point(data = hh_data,
aes(x = longitude_scramble,
y = latitude_scramble,
color = food_security),
size = .7) +
coord_quickmap()
Now let's make a few adjustments to make it better-looking:
# Launch plot device
ggplot() +
geom_point(data = hh_data, # Add points
aes(x = longitude_scramble,
y = latitude_scramble,
color = food_security),
size = .7) +
coord_quickmap() + # Make sure the map isn't distorted
theme_void() + # Remove gray color from background
scale_color_manual(values = c("green", # Use more intuitive colors
"orange",
"red")) +
labs(title="Household Food Security", # Add titles
color="Food Security")
get_map, that loads into R a map from a server such as Google Mapsget_map:location: an address, longitude/latitude pair (in that order), or left/bottom/right/top bounding boxzoom: from 3 (continent) to 21 (building)maptype: the map look# Choose a point to center the map on
centroid <- c(30.46586, -2.014172)
# Get map
basemap <- get_map(location = centroid,
zoom = 11,
maptype = "hybrid")
maptype = "roadmap" maptype = "satellite"
maptype = "toner"
maptype = "watercolor"
To use the basemap on the background of our map, we replace the ggplot() expression in the regular graph with ggmap():
ggmap(basemap) + # Load basemap
geom_point(data = hh_data, # Add points
aes(x = longitude_scramble,
y = latitude_scramble,
color = food_security),
size = .7) +
coord_quickmap() + # Make sure the map isn't distorted
theme_void() + # Remove gray color from background
scale_color_manual(values = c("green", # Set colors so they're more intuitive
"orange",
"red")) +
labs(title="Household Food Security", # Add titles
color="Food Security")
The data we used until now is a data set like any other, with GPS coordinates as variables. We're using the GPS coordinates to create a plot with the same command we used before. However, that's not how spatial data is more commonly organized.
Spatial data usually takes the format of a shapefile. A shapefile has three basic components:
The get_data function from the raster package allows us to download data from GADM (the Database of Global Administrative Areas). Let's download the first an second administrative division for Rwanda:
# First administrative division
rwanda_l1 <- getData('GADM', country='RWA', level=1)
rwanda_l1
## class : SpatialPolygonsDataFrame ## features : 5 ## extent : 28.86171, 30.89907, -2.839973, -1.04745 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 ## variables : 13 ## names : OBJECTID, ID_0, ISO, NAME_0, ID_1, NAME_1, HASC_1, CCN_1, CCA_1, TYPE_1, ENGTYPE_1, NL_NAME_1, VARNAME_1 ## min values : 1, 189, RWA, Rwanda, 1, Amajyaruguru, RW.ES, 1, 1, Province, Province, , Eastern Province|Province de l'Est ## max values : 5, 189, RWA, Rwanda, 5, Umujyi wa Kigali, RW.SU, 5, 5, Province, Province, , Western Province|Province de l'Ouestern Province|Province de l'Ouest
# Second administrative division
rwanda_l2 <- getData('GADM', country='RWA', level=2)
In R, a spatial dataframe is a list, where one element of the list is a dataframe of variables and other is a dataframe of coordinates.
Access a dataframe containing the variables (except the coordinates)
# Access the dataframe containing the database rwanda_l1@data
## 'data.frame': 5 obs. of 13 variables: ## $ OBJECTID : int 1 2 3 4 5 ## $ ID_0 : int 189 189 189 189 189 ## $ ISO : chr "RWA" "RWA" "RWA" "RWA" ... ## $ NAME_0 : chr "Rwanda" "Rwanda" "Rwanda" "Rwanda" ... ## $ ID_1 : int 1 2 3 4 5 ## $ NAME_1 : chr "Amajyaruguru" "Amajyepfo" "Iburasirazuba" "Iburengerazuba" ... ## $ HASC_1 : chr "RW.NO" "RW.SU" "RW.ES" "RW.OU" ... ## $ CCN_1 : int 4 2 5 3 1 ## $ CCA_1 : chr "4" "2" "5" "3" ... ## $ TYPE_1 : chr "Province" "Province" "Province" "Province" ... ## $ ENGTYPE_1: chr "Province" "Province" "Province" "Province" ... ## $ NL_NAME_1: chr "" "" "" "" ... ## $ VARNAME_1: chr "Northern Province|Province du Nord" "Southern Province|Province du Sud" "Eastern Province|Province de l'Est" "Western Province|Province de l'Ouestern Province|Province de l'Ouest" ...
We can quickly plot spatial dataframes using base plot. Note that now the map is not distorted even though we didn't tell it to project the data.
# Quick plot of Rwanda's first administrative division plot(rwanda_l1)
We can quickly plot spatial dataframes.
# Quick plot of Rwanda's first and second administrative divisions plot(rwanda_l1, lwd = 2) plot(rwanda_l2, add = TRUE)
As discussed earlier, base plot is useful to create quick visualizations, but not so much for nice presentations.
However, ggplot can't interpret spatial dataframes. Consequently, we need to transform the data into a format that ggplot can understand. Here, we make a dataframe where each vertice of the polygon is an observation.
# Transform the sshapefile into a data set rwanda_l1_df <- tidy(rwanda_l1, region = "ID_1") rwanda_l2_df <- tidy(rwanda_l2, region = "ID_2") head(rwanda_l2_df)
## long lat order hole piece group id ## 1 29.82406 -1.308745 1 FALSE 1 1.1 1 ## 2 29.82414 -1.308770 2 FALSE 1 1.1 1 ## 3 29.82423 -1.308752 3 FALSE 1 1.1 1 ## 4 29.82433 -1.308763 4 FALSE 1 1.1 1 ## 5 29.82441 -1.308798 5 FALSE 1 1.1 1 ## 6 29.82449 -1.308845 6 FALSE 1 1.1 1
The resulting dataframe from tidy:
id, which is taken from the the variable in the region argument. # Merging spatial data and original data set
rwanda_l1_df <- merge(rwanda_l1_df, rwanda_l1,
by.x="id", by.y="ID_1")
rwanda_l2_df <- merge(rwanda_l2_df, rwanda_l2,
by.x="id", by.y="ID_2")
## id long lat order hole piece ISO ID_1 NAME_1 ## 1 1 29.82406 -1.308745 1 FALSE 1 RWA 1 Amajyaruguru ## 2 1 29.82414 -1.308770 2 FALSE 1 RWA 1 Amajyaruguru ## 3 1 29.82423 -1.308752 3 FALSE 1 RWA 1 Amajyaruguru ## 4 1 29.82433 -1.308763 4 FALSE 1 RWA 1 Amajyaruguru ## 5 1 29.82441 -1.308798 5 FALSE 1 RWA 1 Amajyaruguru ## 6 1 29.82449 -1.308845 6 FALSE 1 RWA 1 Amajyaruguru
Now, let's make a map
# Map first administrative division
ggplot() +
geom_polygon(data = rwanda_l1_df,
aes(x = long,
y = lat,
group = group),
colour = "black") # Set color of line
Now, let's make a map
alpha determines the transparency of the map's fill:ggplot() +
geom_polygon(data = rwanda_l1_df, ## Map first administrative division
aes(x = long,
y = lat,
group = group),
colour = "black", # Set color of line
alpha = 0) # Set transparency of fill
alpha determines the transparence of the map's fill: To make it more interesting, let's also show the second administrative division. We do this by adding a second shapefile to the same plot:
ggplot() + ## Open graph device
geom_polygon(data = rwanda_l2_df, ## Map second administrative division
aes(x = long,
y = lat,
group = group,
fill = id), # One color fill per id
alpha = .3) + # Set transparence of fill
geom_polygon(data = rwanda_l1_df, ## Map first administrative division
aes(x = long,
y = lat,
group = group),
colour = "black", # Set color of line
alpha = 0) # Set transparence of fill
Now let's add another basemap
# Create a bounding box around Rwanda
area <- bbox(rwanda_l1) # bbox() only works with spatial data
# Get the basemap from Google Maps
basemap <- get_map(location = area,
zoom = 8,
maptype = "satellite")
# Create the map
ggmap(basemap) +
geom_polygon(data = rwanda_l2_df, ## Map second administrative division
aes(x = long,
y = lat,
group = group,
fill = id), # One color fill per id
alpha = .3) + # Set transparence of fill
geom_polygon(data = rwanda_l1_df, ## Map first administrative division
aes(x = long,
y = lat,
group = group),
colour = "black", # Set color of line
alpha = 0) # Set transparence of fill
Some final touches to make it prettier:
ggmap(basemap) + ## Add basemap
geom_polygon(data = rwanda_l2_df, ## Map second administrative division
aes(x = long,
y = lat,
group = group,
fill = id), # One color fill per id
alpha = .2) + # Set transparence of fill
geom_polygon(data = rwanda_l1_df, ## Map first administrative division
aes(x = long,
y = lat,
group = group),
colour = "black", # Set color of line
alpha = 0) + # Set transparence of fill
coord_fixed(xlim = c(28.8, 31), # Crop map to remove unnecessary borders
ylim = c(-3,-1)) +
theme_void() + # Remove ticks
theme(legend.position = "none") # Remove legend